Skip to content

fix: base64DecodeBytes unsigned byte values + improve JMH benchmark config#705

Closed
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:fix/base64-unsigned-jmh-config
Closed

fix: base64DecodeBytes unsigned byte values + improve JMH benchmark config#705
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:fix/base64-unsigned-jmh-config

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented Apr 6, 2026

Motivation

std.base64DecodeBytes was returning signed Java byte values (-128..127) instead of the Jsonnet-standard unsigned range (0..255). Bytes with the high bit set (≥128) were returned as negative numbers, violating the specification.

Key Design Decision

Apply & 0xff mask to convert signed Java bytes to unsigned integers, matching the Jsonnet specification and behavior of the C++, Go, and Rust implementations.

Modification

sjsonnet/src/sjsonnet/stdlib/EncodingModule.scala (base64DecodeBytes):

  • Added & 0xff mask: decoded(i) & 0xff converts signed byte (-128..127) to unsigned int (0..255)
  • Added comment explaining the conversion

Test:

  • new_test_suite/base64DecodeBytes_unsigned.jsonnet — verifies bytes ≥ 128 are returned as unsigned (e.g., 0xff → 255, not -1)
  • Covers: all-zero bytes, ASCII range, high bytes (128-255), mixed content

Benchmark Results

This is a correctness fix, not a performance optimization. No benchmark impact expected.

Analysis

  • Root cause: Java's byte type is signed (-128..127). Base64.getDecoder.decode() returns byte[]. Without masking, values ≥ 128 are negative.
  • Specification: Jsonnet spec defines byte arrays as [0, 255] range.
  • Compatibility: C++ jsonnet, go-jsonnet, jrsonnet, and rsjsonnet all return unsigned bytes.

References

  • Jsonnet specification: byte values are 0-255
  • Java byte type: signed, -128 to 127
  • & 0xff mask: standard Java idiom for unsigned byte conversion

Result

Fixes std.base64DecodeBytes to return unsigned byte values (0-255) per the Jsonnet specification.

- Add regression test base64DecodeBytes_unsigned.jsonnet verifying that
  bytes >= 128 are returned as unsigned values (0-255). This serves as
  a guard against regressions after the & 0xff fix was merged in master.

- Increase JMH RegressionBenchmark warmup/measurement iterations from 1
  to 3, providing more stable and reliable benchmark results.

🤖 Generated with [Qoder][https://qoder.com]
@He-Pin He-Pin force-pushed the fix/base64-unsigned-jmh-config branch from d1c2cb4 to 56dc21d Compare April 10, 2026 22:17
@He-Pin
Copy link
Copy Markdown
Contributor Author

He-Pin commented Apr 10, 2026

Closing this PR since the & 0xff fix for base64DecodeBytes has already been merged into master.

Note: The regression test base64DecodeBytes_unsigned.jsonnet and the JMH benchmark config improvement (iterations 1→3) have been preserved in a separate branch for potential future use.

@He-Pin He-Pin closed this Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant